Abstract

Using the theory of group action, we first introduce the concept of the automorphism group of an
exponential family or a graphical model, thus formalizing the general notion of symmetry of a
probabilistic model. This automorphism group provides a precise mathematical framework for lifted
inference in the general exponential family. Its group action partitions the set of random variables
and feature functions into equivalence classes (called orbits) having identical marginals and
expectations. The inference problem is then effectively reduced to computing marginals or
expectations for each class, thus avoiding the need to deal with each individual variable or feature.
We demonstrate the usefulness of this general framework in lifting two classes of variational
approximation for MAP inference: local LP relaxation and local LP relaxation with cycle constraints;
the latter yields the first lifted inference method that operates on a bound tighter than local
constraints. Initial experimental results demonstrate that lifted MAP inference with cycle constraints
achieves state-of-the-art performance, obtaining much better objective function values than local
approximation while remaining relatively efficient.



1 Introduction 

Classical approaches to probabilistic inference - an area now reasonably well understood - have
traditionally exploited low tree-width and sparsity of the graphical model for efficient exact and
approximate inference. A more recent approach known as lifted inference [2, 12, 6, 7] has shown that
very efficient inference is possible in highly-connected but symmetric models, such as those arising
in the context of relational (or first-order) probabilistic models. While it is clear that symmetry is
the essential element in lifted inference, there is currently no formally defined notion of symmetry
of a probabilistic model, and thus no formal account of what "exploiting symmetry" means in lifted
inference.

The symmetry of a mathematical object is typically defined via the set of transformations that
preserve the object of interest. Since this set forms a mathematical group (the so-called automorphism
group of that object), the theory of groups and group actions is essential in the study of symmetry.

In this paper, we first introduce the concept of the automorphism group of an exponential family or a
graphical model, thus formalizing the notion of symmetry of a general graphical model. This
automorphism group provides a precise mathematical framework for lifted inference in graphical models.
Its group action partitions the set of random variables and feature functions into equivalence classes
(a.k.a. orbits) having identical marginals and expectations. The inference problem is effectively
reduced to that of computing marginals or expectations for each class, thus avoiding the need to deal
with each individual variable or feature. We demonstrate the usefulness of this general framework in
lifting two classes of variational approximation for MAP inference: local LP relaxation and local LP
relaxation with cycle constraints; the latter yields the first lifted inference method that operates on
a bound tighter than local constraints. Initial experimental results demonstrate that lifted MAP
inference with cycle constraints achieves state-of-the-art performance, obtaining much better
objective function values than local approximation while remaining relatively efficient.



2 Background on Groups and Graph Automorphisms 

A partition $\Delta = \{\Delta_1, \ldots, \Delta_k\}$ of a set $V$ is a set of disjoint nonempty
subsets of $V$ whose union is $V$. Each element $\Delta_i$ is called a cell. A partition $\Delta$
defines an equivalence relation on $V$, denoted as $\sim_\Delta$, by letting $u \sim_\Delta v$ iff $u$
and $v$ are in the same cell. A partition $\Delta'$ is finer than $\Delta$ if every cell of $\Delta'$
is a subset of some cell of $\Delta$.

We now briefly review some important concepts in group theory and graph automorphisms [5]. 

A group $(G, \cdot)$ is a non-empty set $G$ with a binary operation $\cdot$ that is associative and
closed in $G$; $G$ contains an identity element, denoted as $1$, such that $\forall g \in G$,
$1 \cdot g = g \cdot 1 = g$, and for every $g \in G$ there exists an element $g^{-1}$ such that
$g \cdot g^{-1} = g^{-1} \cdot g = 1$. A group containing $1$ as its only element is called a trivial
group. A subgroup of $G$ is a subset of $G$ that forms a group with the same binary operation as $G$.
We write $G_1 \le G_2$ when $G_1$ is a subgroup$^1$ of $G_2$.

A permutation on a set $V$ is a bijective mapping from $V$ to itself. The set of all permutations of
$V$ together with the mapping-composition operator forms a group named the symmetric group
$\mathbb{S}(V)$. A symmetric group that plays a central role in this paper is the symmetric group
$\mathbb{S}_n$, the set of all permutations of $\{1, 2, \ldots, n\}$. For a permutation
$\pi \in \mathbb{S}_n$, $\pi(i)$ is the image of $i$ under $\pi$. For each vector
$x \in \mathcal{X}^n$, the vector $x$ permuted by $\pi$, denoted by $x^\pi$, is
$(x_{\pi(1)}, \ldots, x_{\pi(n)})$; for a set $A \subset \mathcal{X}^n$, the set $A$ permuted by
$\pi$, denoted by $A^\pi$, is $\{x^\pi \mid x \in A\}$.

The action of a group $G$ on a set $V$ is a mapping that assigns every $g \in G$ to a permutation on
$V$, denoted as $g(\cdot) : V \to V$, such that the identity element $1$ is assigned to the identity
permutation, and the group product of two elements $g_1 \cdot g_2$ is assigned to the composition
$g_1(\cdot) \circ g_2(\cdot)$. The action of a group $G$ on $V$ induces an equivalence relation on $V$
defined as $v \sim v'$ iff there exists $g \in G$ such that $g(v) = v'$ (the fact that $\sim$ is an
equivalence relation follows from the definition of a group). The group action therefore induces a
partition on $V$ called the orbit partition, denoted as $\mathrm{Orb}_G(V)$. The orbit of an element
$v \in V$ under the action of $G$ is the set of elements in $V$ equivalent to $v$:
$\mathrm{orb}_G(v) = \{v' \in V \mid v' \sim v\}$. Any subgroup $G_1 \le G$ will also act on $V$ and
induces a finer equivalence relation (and hence a more refined orbit partition). Given $v \in V$, if
under the group action every element $g \in G$ preserves $v$, that is $\forall g \in G$, $g(v) = v$,
then the group $G$ is said to stabilize $v$.
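The orbit partition induced by a finite permutation group can be computed from a set of generators by a simple closure search. The following is a minimal sketch, assuming each generator is given as a Python dict mapping every element to its image; the function name `orbit_partition` and the 4-element example are illustrative choices, not part of the paper.

```python
def orbit_partition(elements, generators):
    """Orbits of the group generated by `generators` acting on `elements`.

    Each generator is a dict that maps every element to its image.
    For a finite permutation group, closing under the generators alone
    is enough, because every inverse is a power of a generator.
    """
    seen = set()
    orbits = []
    for start in elements:
        if start in seen:
            continue
        orbit = {start}
        frontier = [start]
        while frontier:
            v = frontier.pop()
            for g in generators:
                w = g[v]
                if w not in orbit:
                    orbit.add(w)
                    frontier.append(w)
        seen |= orbit
        orbits.append(sorted(orbit))
    return orbits


# Rotation of a 4-cycle and its square (which generates a subgroup).
r = {1: 2, 2: 3, 3: 4, 4: 1}
r2 = {1: 3, 2: 4, 3: 1, 4: 2}
print(orbit_partition([1, 2, 3, 4], [r]))   # one orbit
print(orbit_partition([1, 2, 3, 4], [r2]))  # a finer partition
```

The second call uses a subgroup of the first group and yields a finer orbit partition, as stated in the text.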

Next, we consider the action of a permutation group on the vertex set of a graph, which leads to the
concept of graph automorphisms.

An automorphism of a graph $\mathfrak{G}$ on a set of vertices $V$ is a permutation
$\pi \in \mathbb{S}(V)$ that permutes the vertices of $\mathfrak{G}$ but preserves the structure
(e.g., adjacency, direction, color) of $\mathfrak{G}$. The set of all automorphisms of $\mathfrak{G}$
forms a group named the automorphism group of $\mathfrak{G}$, denoted as $\mathbb{A}(\mathfrak{G})$.
It is clear that $\mathbb{A}(\mathfrak{G})$ is a subgroup of $\mathbb{S}(V)$. The cardinality of
$\mathbb{A}(\mathfrak{G})$ indicates the level of symmetry in $\mathfrak{G}$; if
$\mathbb{A}(\mathfrak{G})$ is the trivial group then $\mathfrak{G}$ is asymmetric.

The action of $\mathbb{A}(\mathfrak{G})$ on the vertex set $V$ partitions $V$ into the node-orbits
$\mathrm{Orb}_{\mathbb{A}(\mathfrak{G})}(V)$, where each node orbit is a set of vertices equivalent to
one another up to some node relabeling. Furthermore, $\mathbb{A}(\mathfrak{G})$ also acts on the set of
graph edges $E$ by letting $\pi(\{u, v\}) = \{\pi(u), \pi(v)\}$, and this action partitions $E$ into a
set of edge-orbits $\mathrm{Orb}_{\mathbb{A}(\mathfrak{G})}(E)$. Similarly, we also obtain the set of
arc-orbits $\mathrm{Orb}_{\mathbb{A}(\mathfrak{G})}(\vec{E})$.

Computing the automorphism group of a graph is as difficult as determining whether two graphs are
isomorphic, a problem that is known to be in NP, but for which it is unknown whether it has a
polynomial time algorithm or is NP-complete. In practice, there exist efficient computer programs such
as nauty$^2$ [8] for computing automorphism groups of graphs.

1 We use the notation $G_1 \le G_2$ to mean $G_1$ is isomorphic to a subgroup of $G_2$.
2 http://cs.anu.edu.au/people/bdm/nauty/
3 Symmetry of the Exponential Family



3.1 Exponential Family and Graphical Model 

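Consider an exponential family over $n$ random variables $(x_i)_{i \in V}$ where
$V = \{1 \ldots n\}$, $x_i \in \mathcal{X}$, with density function

$$\mathcal{F}(x \mid \theta) = h(x) \exp\left(\langle \Phi(x), \theta \rangle - A(\theta)\right)$$

where $h$ is the base density, $\Phi(x) = (\phi_j(x))_{j \in \mathcal{I}}$,
$\mathcal{I} = \{1, 2, \ldots, m\}$, is an $m$-dimensional feature vector, $\theta \in \mathbb{R}^m$ is
the natural parameter, and $A(\theta)$ the log-partition function. Let
$\Theta = \{\theta \mid A(\theta) < \infty\}$ be the set of natural parameters,
$\mathcal{M} = \{\mu \in \mathbb{R}^m \mid \exists p, \mu = E_p \Phi(x)\}$ the set of realizable mean
parameters, $A^* : \mathcal{M} \to \mathbb{R}$ the convex dual of $A$, and
$m : \Theta \to \mathcal{M}$ the mean parameter mapping that maps
$\theta \mapsto m(\theta) = E_\theta \Phi(x)$. Note that $m(\Theta) = \mathrm{ri}\, \mathcal{M}$ is the
relative interior of $\mathcal{M}$. For more details, see [15].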

Often, a feature function $\phi_i$ depends only on a subset of the variables in $V$. In this case we
will write $\phi_i$ more compactly in factorized form as $\phi_i(x) = f_i(x_{i_1}, \ldots, x_{i_K})$
where the indices $i_j$ are distinct, $i_1 < i_2 < \ldots < i_K$, and $f_i$ cannot be reduced further,
i.e., it must depend on all of its arguments. To keep track of variable indices of arguments of $f_i$,
we let scope($f_i$) denote its set of arguments, $\eta_i(k) = i_k$ the $k$-th argument and $|\eta_i|$
its number of arguments. Factored forms of features can be encoded as a hypergraph
$\mathcal{G}[\mathcal{F}]$ of $\mathcal{F}$ (called the graph structure or graphical model of
$\mathcal{F}$) with nodes $V$, and hyperedges (clusters)
$\{C \mid \exists i, \mathrm{scope}(f_i) = C\}$. For models with pairwise features, $\mathcal{G}$ is a
standard graph.

For discrete random variables (i.e., $\mathcal{X}$ is finite), we often want to work with the
overcomplete family $\mathcal{F}^o$ that we now describe for the case with pairwise features. The set
of overcomplete features $\mathcal{I}^o$ are indicator functions on the nodes and edges of the
graphical model $\mathcal{G}$ of $\mathcal{F}$: $\phi^o_{u:t}(x) = \mathbb{I}\{x_u = t\}$,
$t \in \mathcal{X}$, for each node $u \in V(\mathcal{G})$; and
$\phi^o_{\{u:t, v:t'\}}(x) = \mathbb{I}\{x_u = t, x_v = t'\}$, $t, t' \in \mathcal{X}$, for each edge
$\{u, v\} \in E(\mathcal{G})$. The set of overcomplete realizable mean parameters $\mathcal{M}^o$ is
also called the marginal polytope since the overcomplete mean parameter corresponds to node and edge
marginal probabilities. Given a parameter $\theta$, the transformation of
$\mathcal{F}(x \mid \theta)$ to its overcomplete representation is done by letting $\theta^o$ be the
corresponding parameter in the overcomplete family:
$\theta^o_{u:t} = \sum_{i \,:\, \mathrm{scope}(f_i) = \{u\}} f_i(t)\, \theta_i$ and (assuming
$u < v$)
$\theta^o_{\{u:t, v:t'\}} = \sum_{i \,:\, \mathrm{scope}(f_i) = \{u, v\}} f_i(t, t')\, \theta_i$. It is
straightforward to verify that $\mathcal{F}^o(x \mid \theta^o) = \mathcal{F}(x \mid \theta)$.

3.2 Automorphism Group of an Exponential Family 

We define the symmetry of an exponential family $\mathcal{F}$ as the group of transformations that
preserve $\mathcal{F}$ (hence preserve $h$ and $\Phi$). The kind of transformation used will be a pair
of permutations $(\pi, \gamma)$ where $\pi$ permutes the set of variables and $\gamma$ permutes the
feature vector.

Definition 3.1. An automorphism of the exponential family $\mathcal{F}$ is a pair of permutations
$(\pi, \gamma)$ where $\pi \in \mathbb{S}_n$, $\gamma \in \mathbb{S}_m$ such that for all vectors $x$:
$h(x^\pi) = h(x)$ and $\Phi^{\gamma^{-1}}(x^\pi) = \Phi(x)$ (or equivalently,
$\Phi(x^\pi) = \Phi^\gamma(x)$).

It is straightforward to show that the set of all automorphisms of $\mathcal{F}$, denoted by
$\mathbb{A}[\mathcal{F}]$, forms a subgroup of $\mathbb{S}_n \times \mathbb{S}_m$. This group acts on
$\mathcal{I}$ by the permuting action of $\gamma$, and on $V$ by the permuting action of $\pi$. In the
remainder of this paper, $h$ is always a symmetric function (e.g., $h \equiv 1$); therefore, the
condition $h(x^\pi) = h(x)$ automatically holds.

Example. Let $V = \{1, 2, 3\}$ and $\Phi = \{f_1, f_2, f_3\}$ where $f_1(x_1, x_2) = x_1(1 - x_2)$,
$f_2(x_1, x_3) = x_1(1 - x_3)$, and $f_3(x_2, x_3) = x_2 x_3$. The pair of permutations
$(\pi, \gamma)$ where $\pi = (1 \mapsto 1, 2 \mapsto 3, 3 \mapsto 2)$ and
$\gamma = (1 \mapsto 2, 2 \mapsto 1, 3 \mapsto 3)$ is an automorphism of $\mathcal{F}$, since
$\Phi^{\gamma^{-1}}(x^\pi) = (\phi_2(x_1, x_3, x_2), \phi_1(x_1, x_3, x_2), \phi_3(x_1, x_3, x_2))
= (f_2(x_1, x_2), f_1(x_1, x_3), f_3(x_3, x_2)) = (x_1(1 - x_2), x_1(1 - x_3), x_3 x_2)
= \Phi(x_1, x_2, x_3)$.
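The automorphism condition of Definition 3.1 can be checked by brute force for the example above. The sketch below assumes binary variables (the example does not fix the domain) and encodes permutations as 1-based dicts; the helper names `permute`, `inverse` and `is_automorphism` are hypothetical and introduced only for this illustration.

```python
from itertools import product

# Features of the example: f1(x1,x2), f2(x1,x3), f3(x2,x3).
def f1(a, b): return a * (1 - b)
def f2(a, b): return a * (1 - b)
def f3(a, b): return a * b

def Phi(x):
    x1, x2, x3 = x
    return (f1(x1, x2), f2(x1, x3), f3(x2, x3))

def permute(vec, perm):
    # vec^perm: component j of the result is vec[perm(j)] (1-based perm).
    return tuple(vec[perm[j] - 1] for j in range(1, len(vec) + 1))

def inverse(perm):
    return {image: source for source, image in perm.items()}

def is_automorphism(pi, gamma, domain=(0, 1), n=3):
    # Check Phi^{gamma^{-1}}(x^pi) == Phi(x) for every configuration x.
    gamma_inv = inverse(gamma)
    return all(
        permute(Phi(permute(x, pi)), gamma_inv) == Phi(x)
        for x in product(domain, repeat=n)
    )

pi = {1: 1, 2: 3, 3: 2}
gamma = {1: 2, 2: 1, 3: 3}
identity = {1: 1, 2: 2, 3: 3}
print(is_automorphism(pi, gamma))     # the pair from the example
print(is_automorphism(pi, identity))  # same pi, but features not permuted
```

Pairing the same $\pi$ with the identity on features fails the check, which illustrates that $\pi$ and $\gamma$ have to be chosen jointly.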

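An automorphism $(\pi, \gamma)$ can be characterized in terms of the factorized features as follows.

Proposition 3.2. $(\pi, \gamma)$ is an automorphism of $\mathcal{F}$ if and only if the following
conditions are true for all $i \in \mathcal{I}$: (1) $|\eta_i| = |\eta_{\gamma(i)}|$; (2) $\pi$ is a
bijective mapping from scope($f_i$) to scope($f_{\gamma(i)}$); (3) let
$\alpha = \eta_{\gamma(i)}^{-1} \circ \pi \circ \eta_i$, then $\alpha \in \mathbb{S}_{|\eta_i|}$ and
$f_i(t^\alpha) = f_{\gamma(i)}(t)$ for all $t \in \mathcal{X}^{|\eta_i|}$.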

Remark. Consider automorphisms of the type $(1, \gamma)$: $\gamma$ must permute between the features
having the same scope: scope($f_i$) = scope($f_{\gamma(i)}$). Thus if the features do not have
redundant scopes (i.e., scope($f_i$) $\neq$ scope($f_j$) when $i \neq j$) then $\gamma$ must be $1$.
More generally, when features do not have redundant scopes, $\pi$ uniquely determines $\gamma$. Next,
consider automorphisms of the type $(\pi, 1)$: $\pi$ must permute among variables in a way that
preserves all the features $f_i$. Thus if all features are asymmetric functions then $\pi$ must be $1$;
more generally, $\gamma$ uniquely determines $\pi$. As a consequence, if the features do not have
redundant scopes and are asymmetric functions then there exists a one-to-one correspondence between
the $\pi$ and $\gamma$ that form an automorphism in $\mathbb{A}[\mathcal{F}]$.

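An automorphism as defined above preserves a number of key characteristics of the exponential family
$\mathcal{F}$ (such as its natural parameter space, its mean parameter space, its log-partition
function), as shown in the following theorem.

Theorem 3.3. If $(\pi, \gamma) \in \mathbb{A}[\mathcal{F}]$ then

1. $\pi \in \mathbb{A}(\mathcal{G}[\mathcal{F}])$, i.e. $\pi$ is an automorphism of the graphical model graph $\mathcal{G}[\mathcal{F}]$.

2. $\Theta^\gamma = \Theta$ and $A(\theta^\gamma) = A(\theta)$ for all $\theta \in \Theta$.

3. $\mathcal{F}(x^\pi \mid \theta^\gamma) = \mathcal{F}(x \mid \theta)$ for all $x \in \mathcal{X}^n$, $\theta \in \Theta$.

4. $m^\gamma(\theta) = m(\theta^\gamma)$ for all $\theta \in \Theta$.

5. $\mathcal{M}^\gamma = \mathcal{M}$ and $A^*(\mu^\gamma) = A^*(\mu)$ for all $\mu \in \mathcal{M}$.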

4 Lifted Variational Inference Framework 

We now discuss the principle of how to exploit the symmetry of the exponential family graphical model
for lifted variational inference. In the general variational inference framework [15], marginal
inference is viewed as computing the mean parameter $\mu = m(\theta)$ given a natural parameter
$\theta$ by solving the optimization problem

$$\sup_{\mu \in \mathcal{M}} \langle \theta, \mu \rangle - A^*(\mu). \qquad (4.1)$$

For discrete models, the variational problem is more conveniently posed using the overcomplete
parameterization, for marginal inference

$$\sup_{\mu^o \in \mathcal{M}^o} \langle \mu^o, \theta^o \rangle - A^{o*}(\mu^o) \qquad (4.2)$$

and for MAP inference

$$\max_{x \in \mathcal{X}^n} \ln \mathcal{F}(x \mid \theta) = \sup_{\mu^o \in \mathcal{M}^o} \langle \mu^o, \theta^o \rangle + \mathrm{const}. \qquad (4.3)$$

We first focus on lifting the main variational problem in (4.1) and leave discussions of the other
problems to subsection 4.3.

4.1 Parameter Tying and Lifting Partition 

Lifted inference in essence assumes a parameter-tying setting where some components of $\theta$ are
the same. More precisely, we assume a partition $\Delta$ of $\mathcal{I}$ (called the parameter-tying
partition) such that $j \sim_\Delta j' \Rightarrow \theta_j = \theta_{j'}$. Our goal is to study how
parameter-tying, coupled with the symmetry of the family $\mathcal{F}$, can lead to more efficient
variational inference.

Let $\mathbb{R}^m_\Delta$ denote the subspace
$\{r \in \mathbb{R}^m \mid r_j = r_{j'} \text{ if } j \sim_\Delta j'\}$. For any set
$S \subset \mathbb{R}^m$, let $S_\Delta = S \cap \mathbb{R}^m_\Delta$. Restricting the natural
parameter to $\Theta_\Delta$ is equivalent to parameter tying, and hence equivalent to working with a
different exponential family with $|\Delta|$ aggregating features
$\left(\sum_{j \in \Delta_i} \phi_j\right)_i$. While this family has fewer parameters, it is not
obvious how it would help inference; moreover, in working directly with the aggregation features, the
structure of the original family is lost.

To investigate the effect parameter tying has on the complexity of inference, we turn to the question
of how to characterize the image of $\Theta_\Delta$ under the mean mapping $m$. At first, note that in
general $m(\Theta_\Delta) \neq \mathcal{M}_\Delta$: taking $\Delta$ to be the singleton partition
$\{\mathcal{I}\}$ will enforce all natural parameters to be the same, but clearly this does not
guarantee that all mean parameters are the same. However, one can hope that perhaps some mean
parameters are forced to be the same due to the symmetry of the graphical model. More precisely, we ask
the following question: is there a partition $\varphi$ of $\mathcal{I}$ such that for all
$\theta \in \Theta_\Delta$ the mean parameter is guaranteed to lie inside $\mathcal{M}_\varphi$, and
therefore the domain of the variational problem (4.1) can be restricted accordingly to
$\mathcal{M}_\varphi$?

Such partitions are defined for general convex optimization problems below.



Definition 4.1. (Lifting partition) Consider the convex optimization $\inf_{x \in S} J(x)$ where
$S \subset \mathbb{R}^m$ is a convex set and $J$ is a convex function. A partition $\varphi$ of
$\{1 \ldots m\}$ is a lifting partition for the aforementioned problem iff
$\inf_{x \in S} J(x) = \inf_{x \in S_\varphi} J(x)$, i.e., the constraint set $S$ can be restricted to
$S_\varphi$.

Theorem 4.2. Let $G$ act on $\mathcal{I} = \{1 \ldots m\}$, so that every $g \in G$ corresponds to some
permutation on $\{1 \ldots m\}$. If $S^g = S$ and $J(x^g) = J(x)$ for every $g \in G$ (i.e., $G$
stabilizes both $S$ and $J$) then the induced orbit partition $\mathrm{Orb}_G(\mathcal{I})$ is a
lifting partition for $\inf_{x \in S} J(x)$.
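One standard way to see why a statement of this kind can hold is an orbit-averaging argument. The lines below are a sketch under the hypotheses of the theorem (finite group $G$, convex $S$ and $J$) and are not claimed to be the authors' proof; the symbol $\bar{x}$ and the shorthand $\varphi = \mathrm{Orb}_G(\mathcal{I})$ are introduced here for illustration.

```latex
\begin{align*}
\bar{x} &= \frac{1}{|G|}\sum_{g \in G} x^{g} \in S
  && \text{(each $x^{g} \in S$ since $S^{g} = S$, and $S$ is convex)} \\
\bar{x}^{h} &= \bar{x} \quad \forall h \in G
  && \text{(so $\bar{x}$ is constant on each orbit, i.e. $\bar{x} \in S_{\varphi}$)} \\
J(\bar{x}) &\le \frac{1}{|G|}\sum_{g \in G} J(x^{g}) = J(x)
  && \text{(convexity of $J$ and $J(x^{g}) = J(x)$)}
\end{align*}
```

Hence every feasible $x$ has a counterpart in $S_\varphi$ that is at least as good, which gives $\inf_{x \in S_\varphi} J(x) \le \inf_{x \in S} J(x)$; the reverse inequality holds because $S_\varphi \subseteq S$.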



From theorem 3.3 we know that $\mathbb{A}[\mathcal{F}]$ stabilizes $\mathcal{M}$ and $A^*$; however,
this group does not take the parameter $\theta$ into account. Given a partition $\Delta$, a permutation
$\lambda$ on $\mathcal{I}$ is consistent with $\Delta$ iff $\lambda$ permutes only among elements of
the same cell of $\Delta$. Such permutations are of special interest since for every
$\theta \in \Theta_\Delta$, $\theta^\lambda = \theta$. If $G$ is a group acting on $\mathcal{I}$, we
denote by $G_\Delta$ the set of group elements whose actions are consistent with $\Delta$, that is
$G_\Delta = \{g \in G \mid \forall u \in \mathcal{I}, g(u) \sim_\Delta u\}$. It is straightforward to
verify that $G_\Delta$ is a subgroup of $G$. With this notation, $\mathbb{A}_\Delta[\mathcal{F}]$ is
the subgroup of $\mathbb{A}[\mathcal{F}]$ whose members' action is consistent with $\Delta$. The group
$\mathbb{A}_\Delta[\mathcal{F}]$ thus stabilizes not just the family $\mathcal{F}$, but also every
parameter $\theta \in \Theta_\Delta$. It is straightforward to verify that
$\mathbb{A}_\Delta[\mathcal{F}]$ stabilizes both the constraint set and the objective function of
(4.1). Therefore by the previous theorem, its induced orbit partition yields a lifting partition.

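Corollary 4.3. Let $\varphi = \varphi(\Delta) = \mathrm{Orb}_{\mathbb{A}_\Delta[\mathcal{F}]}(\mathcal{I})$.
Then for all $\theta \in \Theta_\Delta$, $\varphi$ is a lifting partition for the variational problem
(4.1), that is

$$\sup_{\mu \in \mathcal{M}} \langle \theta, \mu \rangle - A^*(\mu) = \sup_{\mu \in \mathcal{M}_\varphi} \langle \theta, \mu \rangle - A^*(\mu) \qquad (4.4)$$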



In (4.4), we call the LHS the ground formulation of the variational problem, and the RHS the lifted
formulation. Let $\ell = |\varphi|$ be the number of cells of $\varphi$; the lifted constraint set
$\mathcal{M}_\varphi$ then effectively lies inside an $\ell$-dimensional subspace where $\ell \le m$.
This forms the core idea of the principle of lifted variational inference: to perform optimization over
the lower dimensional (and hopefully easier) constraint set $\mathcal{M}_\varphi$ instead of
$\mathcal{M}$.

Remark. The above result also holds for any subgroup $G$ of $\mathbb{A}_\Delta[\mathcal{F}]$, since
$\varphi_G = \mathrm{Orb}_G(\mathcal{I})$ is finer than $\varphi$ and is therefore also a lifting
partition. However, the smaller the group $G$, the finer the lifting partition $\varphi_G$, and the
less symmetry can be exploited. In the extreme, $G$ can be the trivial group, $\varphi_G$ is the
discrete partition on $\mathcal{I}$ putting each element in its own cell, and
$\mathcal{M}_{\varphi_G} = \mathcal{M}$, which corresponds to no lifting.



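4.2 Characterization of $\mathcal{M}_\varphi$

We now give a characterization of $\mathcal{M}_\varphi$ in the case of discrete random variables. Note
that $\mathcal{M}$ is the convex hull $\mathcal{M} = \mathrm{conv}\{\Phi(x) \mid x \in \mathcal{X}^n\}$,
which is a polytope in $\mathbb{R}^m$, and $\mathbb{A}[\mathcal{F}]$ acts on the set of configurations
$\mathcal{X}^n$ by the permuting action of $\pi$, which maps $x \mapsto x^\pi$.

Theorem 4.4. Let $\mathcal{O} = \mathrm{Orb}_{\mathbb{A}_\Delta[\mathcal{F}]}(\mathcal{X}^n)$ be the set
of $\mathcal{X}$-configuration orbits. For each orbit $C \in \mathcal{O}$, let
$\bar{\Phi}(C) = \frac{1}{|C|} \sum_{x \in C} \Phi(x)$ be the feature-centroid of all the
configurations $x$ in $C$. Then
$\mathcal{M}_{\varphi(\Delta)} = \mathrm{conv}\{\bar{\Phi}(C) \mid C \in \mathcal{O}\}$.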

As a consequence, the lifted polytope $\mathcal{M}_\varphi$ can have at most $|\mathcal{O}|$ extreme
points. The number of configuration orbits $|\mathcal{O}|$ can be much smaller than the total number of
configurations $|\mathcal{X}|^n$ when the model is highly symmetric. For example, for a fully connected
graphical model with identical pairwise and unary potentials and $\mathcal{X} = \{0, 1\}$, every
permutation $\pi \in \mathbb{S}_n$ is part of an automorphism; thus, every configuration with the same
number of 1's belongs to the same orbit, and hence $|\mathcal{O}| = n + 1$. In general, however,
$|\mathcal{O}|$ often is still exponential in $n$. We discuss approximations of $\mathcal{M}_\varphi$
in Section 5.
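The count $|\mathcal{O}| = n + 1$ for the fully symmetric binary model can be confirmed by enumeration for small $n$. The sketch below assumes the acting group is the full symmetric group on the $n$ variables, as in the example; the function name `configuration_orbits` is illustrative, and the enumeration is exponential in $n$, so it is only meant for tiny cases.

```python
from itertools import permutations, product

def configuration_orbits(n):
    """Orbits of {0,1}^n under all permutations of the n coordinates."""
    perms = list(permutations(range(n)))
    seen = set()
    orbits = []
    for x in product((0, 1), repeat=n):
        if x in seen:
            continue
        # Apply every permutation to x to collect its orbit.
        orbit = {tuple(x[p[i]] for i in range(n)) for p in perms}
        seen |= orbit
        orbits.append(orbit)
    return orbits

for n in range(1, 6):
    print(n, len(configuration_orbits(n)), 2 ** n)
```

Each orbit collects exactly the configurations with a given number of 1's, so the number of orbits grows linearly while the number of configurations grows as $2^n$.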

A representation of the lifted polytope $\mathcal{M}_\varphi$ by a set of constraints in
$\mathbb{R}^{|\varphi|}$ can be directly obtained from the constraints of the polytope $\mathcal{M}$.
For each cell $\varphi_j$ ($j = 1, \ldots, |\varphi|$) of $\varphi$, let $\bar{\mu}_j$ be the common
value of the variables $\mu_i$, $i \in \varphi_j$. Let $\rho$ be the orbit mapping function that maps
each element $i \in \mathcal{I}$ to the corresponding cell $\rho(i) = j$ that contains $i$.
Substituting $\mu_i$ by $\bar{\mu}_{\rho(i)}$ in the constraints of $\mathcal{M}$, we obtain a set of
constraints in $\bar{\mu}$ (in vector form, we substitute $\mu$ by $D\bar{\mu}$ where $D_{ij} = 1$ if
$i \in \varphi_j$ and $0$ otherwise). In doing this, some constraints will become identical and thus
redundant. In general, the number of non-redundant constraints can still be exponential.

4.3 Overcomplete Variational Problems 



We now state analogous results in lifting the overcomplete variational problems (4.2) and (4.3) when
$\mathcal{X}$ is finite. To simplify notation, we will consider only the case where features are unary
or pairwise. As before, the group $\mathbb{A}_\Delta[\mathcal{F}]$ will be used to induce a lifting
partition. However, we need to define the action of this group on the set of overcomplete features
$\mathcal{I}^o$.

Recall that if $(\pi, \gamma) \in \mathbb{A}[\mathcal{F}]$ then $\pi$ is an automorphism of the
graphical model graph $\mathcal{G}$. Since overcomplete features naturally correspond to nodes and
edges of $\mathcal{G}$, $\pi$ has a natural action on $\mathcal{I}^o$ that maps
$v{:}t \mapsto \pi(v){:}t$ and $\{u{:}t, v{:}t'\} \mapsto \{\pi(u){:}t, \pi(v){:}t'\}$. Define
$\varphi^o = \varphi^o(\Delta) = \mathrm{Orb}_{\mathbb{A}_\Delta[\mathcal{F}]}(\mathcal{I}^o)$ to be
the induced orbits of $\mathbb{A}_\Delta[\mathcal{F}]$ on the set of overcomplete features.



Corollary 4.5. For all $\theta \in \Theta_\Delta$, $\varphi^o$ is a lifting partition for the variational problems (4.2) and (4.3).



Thus, the optimization domain can be restricted to $\mathcal{M}^o_{\varphi^o}$, which we term the
lifted marginal polytope. The cells of $\varphi^o$ are intimately connected to the node, edge and arc
orbits of the graph $\mathcal{G}$ induced by $\mathbb{A}_\Delta[\mathcal{F}]$. We now list all the
cells of $\varphi^o$ in the case where $\mathcal{X} = \{0, 1\}$: each node orbit $\bar{v}$ corresponds
to 2 cells $\{v{:}t \mid v \in \bar{v}\}$, $t \in \{0, 1\}$; each edge orbit $\bar{e}$ corresponds to
2 cells $\{\{u{:}t, v{:}t\} \mid \{u, v\} \in \bar{e}\}$, $t \in \{0, 1\}$; and each arc orbit
$\bar{a}$ corresponds to the cell $\{\{u{:}0, v{:}1\} \mid (u, v) \in \bar{a}\}$. The orbit mapping
function $\rho$ maps each element of $\mathcal{I}^o$ to its orbit as follows:
$\rho(v{:}t) = \bar{v}{:}t$, $\rho(\{u{:}t, v{:}t\}) = \overline{\{u, v\}}{:}tt$,
$\rho(\{u{:}0, v{:}1\}) = \overline{(u, v)}{:}01$.

The total number of cells of $\varphi^o$ is $O(|\bar{V}| + |\bar{E}|)$ where $|\bar{V}|$ and
$|\bar{E}|$ are the number of node and edge orbits of $\mathcal{G}$ (each edge orbit corresponds to at
most 2 arc orbits). Thus, in working with $\mathcal{M}^o_{\varphi^o}$, the big-O order of the number of
variables is reduced from the number of nodes and edges in $\mathcal{G}$ to the number of node and edge
orbits.



5 Lifted Approximate MAP Inference 

Approximate variational inference typically works with a tractable approximation of $\mathcal{M}$ and
a tractable approximation of $A^*$. In this paper, we focus only on lifted outer bounds of
$\mathcal{M}^o$ (and thus restrict ourselves to the discrete case). We leave the problem of handling
approximations of $A^*$ to future work. Thus, our focus will be on the LP relaxation of the MAP
inference problem (4.3).



By corollary 4.5, (4.3) is equivalent to the lifted problem
$\sup_{\mu^o \in \mathcal{M}^o_{\varphi^o}} \langle \theta^o, \mu^o \rangle$. Since any outer bound
OUTER $\supset \mathcal{M}^o$ yields an outer bound OUTER$_{\varphi^o}$ of
$\mathcal{M}^o_{\varphi^o}$, we can always relax the lifted problem and replace
$\mathcal{M}^o_{\varphi^o}$ by OUTER$_{\varphi^o}$. But is the relaxed lifted problem on
OUTER$_{\varphi^o}$ equivalent to the relaxed ground problem on OUTER? This depends on whether
$\varphi^o$ is a lifting partition for the relaxed ground problem.

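Theorem 5.1. If the set OUTER = OUTER($\mathcal{G}$) depends only on the graphical model structure
$\mathcal{G}$ of $\mathcal{F}$, then for all $\theta \in \Theta_\Delta$, $\varphi^o$ is a lifting
partition for the relaxed MAP problem

$$\sup_{\mu^o \in \mathrm{OUTER}} \langle \theta^o, \mu^o \rangle = \sup_{\mu^o \in \mathrm{OUTER}_{\varphi^o}} \langle \theta^o, \mu^o \rangle$$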

The most often used outer bound of $\mathcal{M}^o$ is the local marginal polytope
LOCAL($\mathcal{G}$) [15], which enforces consistency for marginals on nodes and between nodes and
edges of $\mathcal{G}$. [13, 14] used CYCLE($\mathcal{G}$), which is a tighter bound that also enforces
consistency of edge marginals on the same cycle of $\mathcal{G}$. The Sherali-Adams hierarchy$^3$ [11]
provides a sequence of outer bounds of $\mathcal{M}^o$ starting from LOCAL($\mathcal{G}$) and
progressively tightening it to the exact marginal polytope $\mathcal{M}^o$. All of these outer bounds
depend only on the structure of the graphical model $\mathcal{G}$, and thus the corresponding relaxed
MAP problems admit $\varphi^o$ as a lifting partition. Note that, except when OUTER = LOCAL, equitable
partitions [5] of $\mathcal{G}$ such as those used in [9] are not lifting partitions for the
approximate variational problem in theorem 5.1.$^4$



3 A note about terminology: Following the tradition in lifted inference, this paper uses the term lift to refer to the exploitation of symmetry
for avoiding doing inference on the ground model. It is unfortunate that the term lift has also been used in the context of coming up with
better bounds for the marginal polytopes. There, lift (as in lift-and-project) means to move to a higher dimensional space where constraints
can be more easily expressed with auxiliary variables.

4 As a counter example, consider a graphical model whose structure is the Frucht graph (http://en.wikipedia.org/wiki/Frucht_graph). 
6 Lifted MAP Inference on the Local Polytope



We now focus on lifted approximate MAP inference using the local marginal polytope LOCAL. From this
point on, we also restrict ourselves to models where the features are pairwise or unary, and variables
are binary ($\mathcal{X} = \{0, 1\}$).

We first aim to give an explicit characterization of the constraints of the lifted local polytope
LOCAL$_{\varphi^o}$. The local polytope LOCAL($\mathcal{G}$) is defined as the set of locally
consistent pseudo-marginals.



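$$\mathrm{LOCAL}(\mathcal{G}) = \left\{ \tau \ge 0 \;\middle|\;
\begin{array}{ll}
\tau_{v:0} + \tau_{v:1} = 1 & \forall v \in V(\mathcal{G}) \\
\tau_{\{u:0, v:0\}} + \tau_{\{u:0, v:1\}} = \tau_{u:0} & \\
\tau_{\{u:0, v:0\}} + \tau_{\{v:0, u:1\}} = \tau_{v:0} & \forall \{u, v\} \in E(\mathcal{G}) \\
\tau_{\{u:1, v:1\}} + \tau_{\{u:0, v:1\}} = \tau_{v:1} & \\
\tau_{\{u:1, v:1\}} + \tau_{\{v:0, u:1\}} = \tau_{u:1} &
\end{array} \right\}$$

Substituting $\tau_i$ by the corresponding $\bar{\tau}_{\rho(i)}$, where $\rho(\cdot)$ is given in
subsection 4.3, and noting that constraints generated by $\{u, v\}$ in the same edge orbit are
redundant, we obtain the constraints for the lifted local polytope LOCAL$_{\varphi^o}$ as follows.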



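$$\mathrm{LOCAL}_{\varphi^o} = \left\{ \bar{\tau} \ge 0 \;\middle|\;
\begin{array}{ll}
\bar{\tau}_{\bar{v}:0} + \bar{\tau}_{\bar{v}:1} = 1 & \forall \text{ node orbit } \bar{v} \\
\bar{\tau}_{\bar{e}:00} + \bar{\tau}_{\overline{(u,v)}:01} = \bar{\tau}_{\bar{u}:0} & \\
\bar{\tau}_{\bar{e}:00} + \bar{\tau}_{\overline{(v,u)}:01} = \bar{\tau}_{\bar{v}:0} & \forall \text{ edge orbit } \bar{e} \text{ with} \\
\bar{\tau}_{\bar{e}:11} + \bar{\tau}_{\overline{(u,v)}:01} = \bar{\tau}_{\bar{v}:1} & \overline{(u,v)}, \overline{(v,u)} : \text{arc orbits of } \bar{e} \\
\bar{\tau}_{\bar{e}:11} + \bar{\tau}_{\overline{(v,u)}:01} = \bar{\tau}_{\bar{u}:1} &
\end{array} \right\}$$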

Thus, the number of constraints needed to describe the lifted local polytope LOCAL$_{\varphi^o}$ is
$O(|\bar{V}| + |\bar{E}|)$. Similar to the ground problem, these constraints can be derived from a
graph representation of the node and edge orbits. Define the lifted graph $\bar{\mathcal{G}}$ to be a
graph whose nodes are the set of node orbits $\bar{V}$ of $\mathcal{G}$. For each edge orbit $\bar{e}$
with a representative $\{u, v\} \in \bar{e}$, there is a corresponding edge on $\bar{\mathcal{G}}$ that
connects the two node orbits $\bar{u}$ and $\bar{v}$. Note that unlike $\mathcal{G}$, the lifted graph
$\bar{\mathcal{G}}$ in general is not a simple graph and can contain self-loops and multi-edges between
two nodes. Figure 6.1 shows the ground graph $\mathcal{G}$ and the lifted graph $\bar{\mathcal{G}}$ for
the example described in subsection 3.2.





Figure 6.1: $\mathcal{G}$ and $\bar{\mathcal{G}}$ of the example described in subsection 3.2. (a) Ground graph $\mathcal{G}$. (b) Lifted graph $\bar{\mathcal{G}}$.



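We now consider the linear objective function $\langle \theta^o, \tau \rangle$. Substituting $\tau_i$
by the corresponding $\bar{\tau}_{\rho(i)}$, we can rewrite the objective function in terms of
$\bar{\tau}$ as $\langle \bar{\theta}, \bar{\tau} \rangle$, where the coefficients $\bar{\theta}$ are
defined on nodes and edges of the lifted graph $\bar{\mathcal{G}}$ as follows. For each node orbit
$\bar{v}$, $\bar{\theta}_{\bar{v}:t} = \sum_{v' \in \bar{v}} \theta^o_{v':t} = |\bar{v}|\, \theta^o_{v:t}$
where $t \in \{0, 1\}$ and $v$ is any representative of $\bar{v}$. For each edge orbit $\bar{e}$ with a
representative $\{u, v\} \in \bar{e}$,
$\bar{\theta}_{\bar{e}:tt} = \sum_{\{u', v'\} \in \bar{e}} \theta^o_{\{u':t, v':t\}} = |\bar{e}|\, \theta^o_{\{u:t, v:t\}}$
where $t \in \{0, 1\}$, and
$\bar{\theta}_{\overline{(u,v)}:01} = \sum_{(u', v') \in \overline{(u,v)}} \theta^o_{\{u':0, v':1\}} = |\overline{(u,v)}|\, \theta^o_{\{u:0, v:1\}}$.
Note that typically the two arc-orbits $\overline{(u,v)}$ and $\overline{(v,u)}$ are not the same, in
which case $|\overline{(u,v)}| = |\overline{(v,u)}| = |\bar{e}|$. However, in case
$\overline{(u,v)} = \overline{(v,u)}$ then $|\overline{(u,v)}| = |\overline{(v,u)}| = 2|\bar{e}|$.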

So, we have shown that the lifted formulation for MAP inference on the local polytope can be described
in terms of the lifted variables $\bar{\tau}$ and the lifted parameters $\bar{\theta}$. These lifted
variables and parameters are associated with the orbits of the ground graphical model. Thus, the
derived lifted formulation can also be read out directly from the lifted graph $\bar{\mathcal{G}}$. In
fact, the derived lifted formulation is the local relaxed MAP problem of the lifted graphical model
$\bar{\mathcal{G}}$. Therefore, any algorithm for solving the local relaxed MAP problem on
$\mathcal{G}$ can also be used to solve the derived lifted formulation on $\bar{\mathcal{G}}$. From the
lifted inference point of view, we can lift any algorithm for solving the local relaxed MAP problem on
$\mathcal{G}$ by constructing $\bar{\mathcal{G}}$ and running the same algorithm on
$\bar{\mathcal{G}}$. This allows us to lift even asynchronous message passing algorithms such as the
max-product linear programming (MPLP) algorithm [4], which cannot be lifted using existing lifting
techniques.

4 (cont.) Since this is a regular graph, LOCAL approximation yields identical constraints for every node. However, the nodes on this graph participate
in cycles of different length, hence are subject to different cycle constraints.

7 Beyond Local Polytope: Lifted MAP Inference with Cycle Inequalities

We now discuss lifting the MAP relaxation on CYCLE($\mathcal{G}$), a bound obtained by tightening
LOCAL($\mathcal{G}$) with an additional set of linear constraints that hold on cycles of the graphical
model structure $\mathcal{G}$, called cycle constraints [13]. These constraints arise from the fact
that the number of cuts (transitions from 0 to 1 or vice versa) in any configuration on a cycle of
$\mathcal{G}$ must be even. Cycle constraints can be framed as linear constraints on the mean vector
$\mu^o$ as follows. For every cycle $C$ (set of edges that form a cycle in $\mathcal{G}$) and every
odd-sized subset $F \subseteq C$

$$\sum_{\{u,v\} \in F} \mathrm{nocut}(\{u, v\}, \tau) + \sum_{\{u,v\} \in C \setminus F} \mathrm{cut}(\{u, v\}, \tau) \ge 1 \qquad (7.1)$$

where $\mathrm{nocut}(\{u, v\}, \tau) = \tau_{\{u:0, v:0\}} + \tau_{\{u:1, v:1\}}$ and
$\mathrm{cut}(\{u, v\}, \tau) = \tau_{\{u:0, v:1\}} + \tau_{\{v:0, u:1\}}$.
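The parity argument behind (7.1) can be checked exhaustively on a small cycle. The sketch below evaluates the left-hand side of (7.1) from the per-edge cut mass alone, assuming normalized edge pseudo-marginals so that nocut = 1 - cut; the function names `cycle_lhs` and `cycle_inequalities_hold` are illustrative and not taken from the paper.

```python
from itertools import combinations, product

def cycle_lhs(cut, F):
    """Left-hand side of (7.1) on a cycle with edges 0..k-1.

    cut[e] is the cut mass of edge e; nocut is taken to be 1 - cut[e]
    (an assumption that holds for normalized edge pseudo-marginals).
    F is the set of edge indices that contribute their nocut term.
    """
    return (sum(1 - cut[e] for e in F)
            + sum(cut[e] for e in range(len(cut)) if e not in F))

def cycle_inequalities_hold(k):
    """Check (7.1) for every 0/1 configuration on a cycle of length k."""
    edges = [(j, (j + 1) % k) for j in range(k)]
    for x in product((0, 1), repeat=k):
        cut = [int(x[u] != x[v]) for u, v in edges]
        for size in range(1, k + 1, 2):          # odd-sized subsets F
            for F in combinations(range(k), size):
                if cycle_lhs(cut, set(F)) < 1:
                    return False
    return True

print(cycle_inequalities_hold(3), cycle_inequalities_hold(4))
# A fractional point that puts all mass on "cut" for each triangle edge:
print(cycle_lhs([1.0, 1.0, 1.0], {0, 1, 2}))
```

Every integral configuration satisfies all the inequalities, while the fractional point in the last line gives a left-hand side below 1 and is therefore cut off by the cycle constraint with $F$ equal to the whole triangle.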

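Theorem 5.1 guarantees that MAP inference on CYCLE can be lifted by restricting the feasible domain to
CYCLE$_{\varphi^o}$, which we term the lifted cycle polytope. Substituting the original variables
$\tau$ by the lifted variables $\bar{\tau}$, we obtain the lifted cycle constraints in terms of
$\bar{\tau}$

$$\sum_{\{u,v\} \in F} \mathrm{nocut}(\overline{\{u, v\}}, \bar{\tau}) + \sum_{\{u,v\} \in C \setminus F} \mathrm{cut}(\overline{\{u, v\}}, \bar{\tau}) \ge 1 \qquad (7.2)$$

where $\mathrm{nocut}(\overline{\{u, v\}}, \bar{\tau}) = \bar{\tau}_{\overline{\{u,v\}}:00} + \bar{\tau}_{\overline{\{u,v\}}:11}$
and $\mathrm{cut}(\overline{\{u, v\}}, \bar{\tau}) = \bar{\tau}_{\overline{(u,v)}:01} + \bar{\tau}_{\overline{(v,u)}:01}$,
where $\overline{(u,v)}$ and $\overline{(v,u)}$ are the arc-orbits corresponding to the edge-orbit
$\overline{\{u, v\}}$.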

7.1 Lifted Cycle Constraints on All Cycles Passing Through a Fixed Node 

Fix a node $i$ in $\mathcal{G}$, and let Cyc[$i$] be the set of cycle constraints generated from all
cycles passing through $i$. A cycle is simple if it does not intersect itself or contain repeated
edges; [13] considers only simple cycles, but we will also consider any cycle, including non-simple
cycles, in Cyc[$i$]. Adding non-simple cycles does not change the feasible set, since constraints on
non-simple cycles of $\mathcal{G}$ are redundant. We now give a precise characterization of
$\overline{\mathrm{Cyc}}[i]$, the set of lifted cycle constraints obtained by lifting all cycle
constraints in Cyc[$i$] via the transformation from (7.1) to (7.2).

The lifted graph fixing $i$, $\bar{\mathcal{G}}[i]$, is defined as follows. Let
$\mathbb{A}_\Delta[\mathcal{F}, i]$ be the subgroup of $\mathbb{A}_\Delta[\mathcal{F}]$ that fixes $i$,
that is, $\pi(i) = i$. The set of nodes of $\bar{\mathcal{G}}[i]$ is the set of node orbits
$\bar{V}[i]$ of $\mathcal{G}$ induced by $\mathbb{A}_\Delta[\mathcal{F}, i]$, and the set of edges is
the set of edge orbits $\bar{E}[i]$ of $\mathcal{G}$. Each edge orbit connects to the orbits of the two
adjacent nodes (which could form just one node orbit). Since $i$ is fixed, $\{i\}$ is a node orbit, and
hence is a node on $\bar{\mathcal{G}}[i]$. Note that $\bar{\mathcal{G}}[i]$ in general is not a simple
graph: it can have multi-edges and loops.

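Theorem 7.1. Let $\bar{C}$ be a cycle (not necessarily simple) in $\bar{\mathcal{G}}[i]$ that passes
through the node $\{i\}$. For any odd-sized $\bar{F} \subseteq \bar{C}$

$$\sum_{\bar{e} \in \bar{F}} \mathrm{nocut}(\bar{e}, \bar{\tau}) + \sum_{\bar{e} \in \bar{C} \setminus \bar{F}} \mathrm{cut}(\bar{e}, \bar{\tau}) \ge 1 \qquad (7.3)$$

is a constraint in $\overline{\mathrm{Cyc}}[i]$. Furthermore, all constraints in
$\overline{\mathrm{Cyc}}[i]$ can be expressed this way.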

7.2 Separation of lifted cycle constraints 

While the number of cycle constraints may be reduced significantly in the lifted space, it may still
be computationally expensive to list all of them. To address this issue, we follow [13] and employ a
cutting plane approach in which we find and add only the most violated lifted cycle constraint in each
iteration (separation operation).

For finding the most violated lifted cycle constraint, we propose a lifted version of the method presented 
by fOl . which performs the separation by iterating over the nodes of the graph Q and for each node i finds the 
most violated cycle constraint from all cycles passing through $i$. Theorem 7.1 suggests that all
lifted cycle constraints in $\overline{\mathrm{Cyc}}[i]$ can be separated by mirroring
$\bar{\mathcal{G}}[i]$ and performing a shortest path search from $\{i\}$ to its mirrored node, similar
to the way separation is performed on ground cycle constraints [13].

To find the most violated lifted cycle constraint, we could first find the most violated lifted cycle constraint
$C_i$ in $\overline{\mathrm{Cyc}}[i]$ for each node i, and then take the most violated constraint over all $C_i$. However, note that if i
and i' are in the same node orbit, then $\overline{\mathrm{Cyc}}[i] = \overline{\mathrm{Cyc}}[i']$. Hence, we can perform separation using the following
algorithm:

1. For each node orbit $\bar{v} \in \bar{V}$, choose a representative $i \in \bar{v}$ and find its most violated lifted cycle constraint
$C_{\bar{v}} \in \overline{\mathrm{Cyc}}[i]$ using a shortest path algorithm on the mirror graph of $\bar{\mathcal{G}}[i]$.

2. Return the most violated constraint over all $C_{\bar{v}}$.

Notice that both $\bar{\mathcal{G}}[i]$ and its mirror graph have to be calculated only once per graph. In each separation iteration
we can reuse these structures, provided that we adapt the edge weights in the mirror graph according to the
current marginals.
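
The shortest path search on the mirror graph used in step 1 can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: each edge of the (possibly non-simple) graph is given with nonnegative weights `cut` and `nocut` computed by the caller from the current marginals, and the function name `most_violated_cycle` is hypothetical. Every node gets two copies; moving along an edge within the same copy pays `cut` (edge in $\bar{C} \setminus F$), and crossing to the other copy pays `nocut` (edge in F). A path from a node to its mirrored copy crosses an odd number of times, so its length is the left-hand side of (7.3) for some odd-sized F.

```python
import heapq

def most_violated_cycle(edges, start):
    """edges: list of (a, b, cut, nocut); multi-edges and loops are allowed.

    Returns (lhs_value, [(edge_index, in_F), ...]) for a minimum-weight
    closed walk through `start` with an odd number of F edges, or
    (inf, []) if no such walk exists. The constraint is violated if lhs < 1.
    """
    adj = {}
    for k, (a, b, cut, nocut) in enumerate(edges):
        for s in (0, 1):
            # Same-side arcs: edge not in F, contributes cut(e).
            adj.setdefault((a, s), []).append(((b, s), cut, k, False))
            adj.setdefault((b, s), []).append(((a, s), cut, k, False))
            # Crossing arcs: edge in F, contributes nocut(e).
            adj.setdefault((a, s), []).append(((b, 1 - s), nocut, k, True))
            adj.setdefault((b, s), []).append(((a, 1 - s), nocut, k, True))
    source, target = (start, 0), (start, 1)
    dist, prev = {source: 0.0}, {}
    heap, counter = [(0.0, 0, source)], 1  # counter breaks ties in the heap
    while heap:
        d, _, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        if node == target:
            break
        for nxt, w, k, in_f in adj.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = (node, k, in_f)
                heapq.heappush(heap, (nd, counter, nxt))
                counter += 1
    if target not in dist:
        return float("inf"), []
    path, node = [], target
    while node != source:
        node, k, in_f = prev[node]
        path.append((k, in_f))
    return dist[target], path[::-1]
```

Running this once per node-orbit representative and keeping the smallest returned value corresponds to step 2. Only the weights change between cutting plane iterations, so the adjacency structure could be cached rather than rebuilt as it is in this sketch.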



8 Detecting Symmetries in Exponential Families 

8.1 Detecting Symmetries via Graph Automorphisms 

We now discuss the computation of a subgroup of the automorphism group $A_\Delta[\mathcal{F}]$. Our approach is to construct
a suitable graph whose automorphism group is guaranteed to be a subgroup of $A_\Delta[\mathcal{F}]$, so that any tool or
algorithm for computing graph automorphisms can be applied. The constructed graph resembles a factor graph
representation of $\mathcal{F}$. However, we also use colors of factor nodes to mark feature functions that are identical
and in the same cell of $\Delta$, and colors of edges to encode symmetry of the feature functions themselves.

Definition 8.1. The colored factor graph induced by $\mathcal{F}$ and $\Delta$, denoted by $\mathfrak{G}_\Delta[\mathcal{F}]$, is a bipartite graph with
nodes $V(\mathfrak{G}) = \{x_1 \ldots x_n\} \cup \{f_1 \ldots f_m\}$ and edges $E(\mathfrak{G}) = \{\{x_{\eta_i(k)}, f_i\} \mid i \in \mathcal{I}, k = 1 \ldots |\eta_i|\}$. Variable
nodes are assigned the same color, which is different from the colors of factor nodes. Factor nodes $f_i$ and $f_j$ have
the same color iff $f_i = f_j$ and $i \sim_\Delta j$. If the function $f_i$ is symmetric, then all edges adjacent to $f_i$ have the same
color; otherwise, they are colored according to the argument number of $f_i$, i.e., $\{x_{\eta_i(k)}, f_i\}$ is assigned the k-th
color.

Theorem 8.2. The automorphism group $A[\mathfrak{G}_\Delta]$ of $\mathfrak{G}_\Delta[\mathcal{F}]$ is a subgroup of $A_\Delta[\mathcal{F}]$, i.e., $A[\mathfrak{G}_\Delta] \le A_\Delta[\mathcal{F}]$.

Finding the automorphism group $A[\mathfrak{G}_\Delta]$ of the graph $\mathfrak{G}_\Delta[\mathcal{F}]$ therefore yields a procedure to compute a
subgroup of $A_\Delta[\mathcal{F}]$. Thus, according to corollary 4.3, the induced orbit partition on the factor nodes of $\mathfrak{G}_\Delta[\mathcal{F}]$
is a lifting partition for the variational problems discussed earlier. Nauty, for example, directly supports the
operation of computing the automorphism group of a graph and extracting the induced node orbits.
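
To make the coloring rules of Definition 8.1 concrete, here is a minimal sketch that builds the colored factor graph for a toy family and extracts the factor-node orbits. It is an illustration only: it replaces nauty with a brute-force search over permutations (feasible only for very small graphs), and the feature encoding `(func_id, cell, scope, symmetric)` and all function names are assumptions introduced for this example.

```python
from itertools import permutations

def colored_edges(features):
    """features: list of (func_id, cell, scope, symmetric).

    Returns {(variable, factor_index): edge_color}; edges of a symmetric
    feature share color 0, otherwise the color is the argument position.
    """
    edges = {}
    for i, (_, _, scope, symmetric) in enumerate(features):
        for k, var in enumerate(scope):
            edges[(var, i)] = 0 if symmetric else k
    return edges

def graph_automorphisms(n_vars, features):
    """All pairs (pi, gamma) preserving factor colors and edge colors."""
    edges = colored_edges(features)
    m = len(features)
    factor_color = [(f, c) for f, c, _, _ in features]
    result = []
    for pi in permutations(range(n_vars)):
        for gamma in permutations(range(m)):
            if any(factor_color[gamma[i]] != factor_color[i] for i in range(m)):
                continue
            # Every colored edge must map onto an edge of the same color.
            if all(edges.get((pi[v], gamma[i])) == col
                   for (v, i), col in edges.items()):
                result.append((pi, gamma))
    return result

def factor_orbits(n_vars, features):
    """Orbit partition of factor indices induced by the automorphisms."""
    autos = graph_automorphisms(n_vars, features)
    seen, orbs = set(), []
    for i in range(len(features)):
        if i not in seen:
            orb = {gamma[i] for _, gamma in autos}
            seen |= orb
            orbs.append(sorted(orb))
    return orbs
```

On a chain of three variables with the same pairwise feature on (0, 1) and (1, 2), the two factors fall into one orbit when the feature is symmetric, and into separate orbits when it is not, because the argument-position edge colors rule out the reflection.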



8.2 Symmetries of Markov Logic Networks 

A Markov Logic Network (MLN) [10] is prescribed by a list of weighted formulas $F_1 \ldots F_K$ (consisting of a
set of predicates, logical variables, constants, and a weight vector w) and a logical domain $\mathcal{D} = \{a_1 \ldots a_{|\mathcal{D}|}\}$.
Let $\mathcal{D}_o$ be the set of objects appearing as constants in these formulas; then $\mathcal{D}_* = \mathcal{D} \setminus \mathcal{D}_o$ is the set of objects
in $\mathcal{D}$ that do not appear in these formulas. Let Gr be the set of all ground predicates $p(a_1 \ldots a_\ell)$. If s is a
substitution, $F_i[s]$ denotes the result of applying the substitution s to $F_i$ and is a grounding of $F_i$ if it does not
contain any free logical variables. The set of all groundings of $F_i$ is $\mathrm{GrF}_i$, and let $\mathrm{GrF} = \mathrm{GrF}_1 \cup \ldots \cup \mathrm{GrF}_K$.
The MLN corresponds to an exponential family $\mathcal{F}_{MLN}$ where Gr is the variable index set and each grounding
$F_i[s] \in \mathrm{GrF}_i$ is a feature function $\phi_{F_i[s]}(\omega) = \mathbb{I}(\omega \models F_i[s])$ with the associated parameter $\theta_{F_i[s]} = w_i$, where
$\omega$ is a truth assignment to all the ground predicates in Gr and $w_i$ is the weight of the formula $F_i$. Since all
the ground features of the formula $F_i$ have the same parameter $w_i$, the MLN also induces the parameter-tying
partition $\Delta_{MLN} = \{\{\phi_{F_1[s]}(\omega)\} \ldots \{\phi_{F_K[s]}(\omega)\}\}$.
Let a renaming permutation r be a permutation over $\mathcal{D}$ that fixes every object in $\mathcal{D}_o$, i.e., r only permutes
objects in $\mathcal{D}_*$. Thus, the set of all such renaming permutations is a group $G^{re}$ that is isomorphic to the symmetric
group $\mathbb{S}(\mathcal{D}_*)$. Consider the following actions of $G^{re}$ on Gr and GrF: $\pi_r : p(a_1 \ldots a_\ell) \mapsto p(r(a_1) \ldots r(a_\ell))$
and $\gamma_r : F_i[s] \mapsto F_i[r(s)]$, where $r(s = (x_1/a_1, \ldots, x_k/a_k)) = (x_1/r(a_1), \ldots, x_k/r(a_k))$. Basically, $\pi_r$ and $\gamma_r$
rename the constants in each ground predicate $p(a_1 \ldots a_\ell)$ and ground formula $F_i[s]$ according to the renaming
permutation r. The following theorem (a consequence of Lemma 1 from Bui et al. [1]) shows that $G^{re}$ is
isomorphic to a subgroup of $A[\mathcal{F}_{MLN}]$, the automorphism group of the exponential family $\mathcal{F}_{MLN}$.

Theorem 8.3. For every renaming permutation r, $(\pi_r, \gamma_r) \in A[\mathcal{F}_{MLN}]$. Thus, $G^{re} \le A[\mathcal{F}_{MLN}]$.

Furthermore, observe that $\gamma_r$ only maps between groundings of a formula $F_i$, thus the action of $G^{re}$ on
GrF is consistent with the parameter-tying partition $\Delta_{MLN} = \{\{\phi_{F_1[s]}(\omega)\} \ldots \{\phi_{F_K[s]}(\omega)\}\}$. Thus, $G^{re} \le A_{\Delta_{MLN}}[\mathcal{F}_{MLN}]$.
According to corollary 4.3, the orbit partition induced by the action of $G^{re}$ on GrF is a lifting
partition for the variational inference problems associated with the exponential family $\mathcal{F}_{MLN}$. In addition,
this orbit partition can be quickly derived from the first-order representation of an MLN; the size of this orbit
partition depends only on the number of observed constants $|\mathcal{D}_o|$, and does not depend on the actual domain size
$|\mathcal{D}|$.
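
The claim that the number of orbits depends only on the observed constants can be checked on ground predicates. The sketch below is an illustration under assumptions, not the paper's procedure: it uses the standard fact that two constant tuples are related by a renaming permutation exactly when they agree on observed constants and share the same equality pattern among the remaining arguments. The names `renaming_signature` and `atom_orbits` are hypothetical, and the same idea would apply to substitutions of ground formulas.

```python
from itertools import product

def renaming_signature(args, observed):
    """Canonical form of a constant tuple under renamings that fix `observed`."""
    first_seen, sig = {}, []
    for a in args:
        if a in observed:
            sig.append(("obs", a))  # observed constants are never renamed
        else:
            # Unobserved constants are identified only by order of first use.
            sig.append(("free", first_seen.setdefault(a, len(first_seen))))
    return tuple(sig)

def atom_orbits(domain, observed, arity=2):
    """Group all argument tuples of a predicate into renaming orbits."""
    orbits = {}
    for args in product(domain, repeat=arity):
        orbits.setdefault(renaming_signature(args, observed), []).append(args)
    return orbits
```

For a binary predicate with observed constants A and B, calling `atom_orbits` on domains with 3 and with 30 unobserved constants yields the same number of orbits; only the orbit sizes grow with the domain.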



9 Experiments 

We experiment with several propositional and lifted methods for variational MAP inference by varying the 
domain size of the following MLN: 

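$w_1$    $x \neq y \wedge x \neq z \wedge y \neq z \Rightarrow \mathrm{pred}(x, y) \wedge \mathrm{pred}(y, z)$
$w_2$    $x \neq y \wedge \mathrm{obs}(x, y) \Rightarrow \mathrm{pred}(x, y)$

         $\mathrm{obs}(A, B)$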

This MLN is designed to be a simplified version of models that enforce transitivity for the predicate pred, and
will be called the semi-transitive model (see footnote 5). We set the weights to $w_1 = -100$ and $w_2 = 0.1$. The negative
$w_1$ yields a repulsive model with relatively strong interaction, while the shared predicate and variables in the
first formula are known to be a difficult case for lifted inference. The third formula is an observation with two
constants A and B.

The ground Markov network of the above MLN corresponds to an exponential family $\mathcal{F}_{MLN}$, and we
use the two methods described in Sections 8.1 and 8.2 to derive lifting partitions. The first method (nauty)
fully grounds the MLN, then finds a lifting partition using nauty. The second (renaming) works directly with
the MLN, and uses the renaming group to find a lifting partition. We use two outer bounds to the marginal
polytope: LOCAL and CYCLE. There are three variants of each method: propositional, lifting using the nauty
orbit partition, and lifting using the renaming orbit partition. This yields a total of six methods to compare. For
reference, we also calculate the exact solution to the MAP problem using ILP.



Figure 9.1a shows the runtime (in milliseconds) until convergence for different domain sizes of the logical
variables in our MLN. We can make a few observations. First, in most cases lifting dramatically reduces
runtime for larger domains. Second, nauty-based methods suffer from larger domain sizes. This is expected, as
we perform automorphism finding on propositional graphs of increasing size. Third, the renaming partition
outperforms nauty partitions, by virtue of working directly with the first-order representation. Notice in
particular that for lifted-via-renaming methods we can still observe a dependency on domain size, but this is an artifact of
our current implementation; we expect these curves to become constant in future versions. Finally, all but the propositional cycle
method are faster than ILP.



Figure 9.1b illustrates how the objective changes over cutting plane iterations (and hence time), all for the
case of domain size 10. Both the local polytope and ILP approaches have no cutting plane iterations, and hence
are represented as single points. Given that ILP is exact, the ILP point gives the optimal solution. Notice that
all methods are based on outer/upper bounds on the variational objective, and hence their objective values decrease over time.
First, we can observe that the CYCLE methods converge to the (almost) optimal solution, substantially better
than the LOCAL methods. However, in the propositional case the CYCLE algorithm converges very slowly,
and is only barely faster than ILP.



5 If pred(x, y) = 1 is interpreted as having a (directed) edge from x to y, then this model represents a random graph whose nodes are
elements of the domain of the MLN. More specifically, the model can be thought of as a 2-star Markov graph [3].

[Figure 9.1, panel (a): Runtime vs. domain size. Panel (b): Objective over time for domain size 10.]

Figure 9.1: Experiments with a semi-transitive model with one observed variable. Due to large differences
between runtimes, time is always presented in logarithmic scale.

Lifted CYCLE methods are the clear winners for this problem. We can also see how the different lifting 
partitions affect CYCLE performance. The renaming partition performs its first iteration much quicker than 
the nauty-based partition, since nauty needs to work on the full grounded network. Consequently, it converges 
much earlier, too. However, we can also observe that the renaming partition is more fine-grained than the 
nauty partition, leading to larger orbit graphs and hence slower iterations. Notably, working with lifted cycle 
constraints gives us substantial runtime improvements, and effectively optimal solutions. 



10 Conclusion 

We presented a new general framework for lifted variational inference. In doing so, we introduced and studied
a precise mathematical definition of symmetry of graphical models via the construction of their automorphism
groups. Using the device of automorphism groups, orbits of random variables are obtained, and lifted variational
inference is realized as solving the corresponding convex variational optimization problem in the space
of per-orbit random variables. Our framework enables lifting a large class of approximate variational MAP
inference algorithms, including the first lifted algorithm for MAP inference with cycle constraints. We presented
experimental results demonstrating that lifted MAP inference with cycle constraints achieves state-of-the-art
performance, obtaining much better objective function values than the LOCAL approximation while remaining
relatively efficient. Our future work includes extending this approach to handle approximations of convex upper
bounds of A*, which would enable lifting the full class of approximate convex variational marginal inference.

11 Proofs



Proof of proposition 3.2

Proof. (Part 1) We first prove that if $(\pi, \gamma) \in A[\mathcal{F}]$ then the conditions in the proposition hold. Pick $i \in \mathcal{I}$ and
let $\gamma(i) = j$. Since $\Phi^\gamma(x) = \Phi(x^\pi)$, $\phi_j(x) = \phi_i(x^\pi)$. Expressing the features $\phi_i$ and $\phi_j$ in their factorized forms,
we have $f_j(x_{j_1} \ldots x_{j_{|\eta_j|}}) = f_i(x_{\pi(i_1)} \ldots x_{\pi(i_{|\eta_i|})})$. Since $f_j$ cannot be reduced further, it must depend on all
the distinct arguments in $\{j_1 \ldots j_{|\eta_j|}\}$. This implies that the set of arguments on the right-hand side satisfies
$\{\pi(i_1) \ldots \pi(i_{|\eta_i|})\} \supset \{j_1 \ldots j_{|\eta_j|}\}$. Thus $|\eta_i| \ge |\eta_j|$. Applying the same argument with the automorphism $(\pi^{-1}, \gamma^{-1})$ and noting that
$\gamma^{-1}(j) = i$, we obtain $|\eta_j| \ge |\eta_i|$. Thus $|\eta_i| = |\eta_j| = K$. Furthermore, $\{\pi(i_1) \ldots \pi(i_K)\} = \{j_1 \ldots j_K\}$. This
implies that $\pi$ is a bijection from $\mathrm{scope}(f_i) = \{i_1 \ldots i_K\}$ to $\mathrm{scope}(f_j) = \{j_1 \ldots j_K\}$.

For the third condition, from $f_j(x_{j_1} \ldots x_{j_K}) = f_i(x_{\pi(i_1)} \ldots x_{\pi(i_K)})$, we let $t_k = x_{j_k}$ so that $t_{\eta_j^{-1}(k)} = x_k$
(since $j_k = \eta_j(k)$) to arrive at $f_j(t_1 \ldots t_K) = f_i(t_{\eta_j^{-1} \circ \pi \circ \eta_i(1)} \ldots t_{\eta_j^{-1} \circ \pi \circ \eta_i(K)})$, or in short form $f_j(t) = f_i(t^\alpha)$.
$\alpha$ is a bijection since all the mappings $\eta_j$, $\eta_i$ and $\pi$ are bijections.

(Part 2) Let $(\pi, \gamma)$ be a pair of permutations such that the three conditions are satisfied; we will show that
they form an automorphism of $\mathcal{F}$. Pick $i \in \mathcal{I}$ and let $j = \gamma(i)$ and $K = |\eta_i| = |\eta_j|$. From $f_j(t) = f_i(t^\alpha)$,
we have $f_j(x_{j_1} \ldots x_{j_K}) = f_i(x_{j_{\alpha(1)}} \ldots x_{j_{\alpha(K)}})$. Note that $j_{\alpha(k)} = \eta_j \circ \alpha(k) = \pi(i_k)$. Thus $f_j(x_{j_1} \ldots x_{j_K}) =
f_i(x_{\pi(i_1)} \ldots x_{\pi(i_K)})$, so $\phi_j(x) = \phi_i(x^\pi)$ and hence $\Phi^\gamma(x) = \Phi(x^\pi)$. □

Proof of theorem 3.3

Proof. Part (1). To prove that $\pi$ is an automorphism of $\mathcal{G}$, the hypergraph representing the structure of the
exponential family graphical model, we need to show that $c \subset V$ is a hyperedge (cluster) of $\mathcal{G}$ iff $\pi(c)$ is a
hyperedge.

If c is a hyperedge, $\exists i \in \mathcal{I}$ such that $c = \mathrm{scope}(f_i)$. Let $j = \gamma(i)$; by proposition 3.2, $\pi(c) = \mathrm{scope}(f_j)$, so
$\pi(c)$ is also a hyperedge.

If $\pi(c)$ is a hyperedge, applying the same reasoning with the automorphism $(\pi^{-1}, \gamma^{-1})$, we obtain that $\pi^{-1}(\pi(c)) =
c$ is also a hyperedge.

Part (2)-(5). We first state some identities that will be used repeatedly throughout the proof. Let $x, y \in \mathbb{R}^n$.
The first identity states that permuting two vectors does not change their inner product:

$$\langle x, y \rangle = \langle x^\pi, y^\pi \rangle \qquad (11.1)$$

As a result, if $(\pi, \gamma) \in A[\mathcal{F}]$,

$$\langle \Phi(x^\pi), \theta^\gamma \rangle = \langle \Phi^{\gamma^{-1}}(x^\pi), \theta \rangle = \langle \Phi(x), \theta \rangle \qquad (11.2)$$

The next identity allows us to permute the integrating variable in a Lebesgue integration:

$$\int_{\mathcal{X}^n} f(x) \, d\lambda = \int_{\mathcal{X}^n} f(x^\pi) \, d\lambda \qquad (11.3)$$

where $\lambda$ is a counting measure, or a Lebesgue measure over $\mathbb{R}^n$. The case of counting measure can be
verified directly by establishing a bijection between summands of the two summations, and the case of Lebesgue
measure is a direct result of the property of linearly transformed Lebesgue integrals (Theorem 24.32, page 616

Ml 

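Part (2). By definition of the log-partition function,

$$\begin{aligned}
A(\theta^\gamma) &= \log \int_{\mathcal{X}^n} h(x) \exp \langle \Phi(x), \theta^\gamma \rangle \, d\lambda \\
&= \log \int_{\mathcal{X}^n} h(x^\pi) \exp \langle \Phi(x^\pi), \theta^\gamma \rangle \, d\lambda \quad \text{(by 11.3)} \\
&= \log \int_{\mathcal{X}^n} h(x) \exp \langle \Phi(x), \theta \rangle \, d\lambda \quad \text{(by 11.2)} \\
&= A(\theta).
\end{aligned}$$

As a result, $\Theta^\gamma = \{\theta^\gamma \mid A(\theta) < \infty\} = \{\theta^\gamma \mid A(\theta^\gamma) < \infty\} = \Theta$.

Part (3). $\mathcal{F}(x^\pi \mid \theta^\gamma) = h(x^\pi) \exp(\langle \Phi(x^\pi), \theta^\gamma \rangle - A(\theta^\gamma)) = h(x) \exp(\langle \Phi(x), \theta \rangle - A(\theta)) = \mathcal{F}(x \mid \theta)$.

Part (4). Expanding $m^\gamma(\theta)$ gives

$$m^\gamma(\theta) = \mathbb{E}_\theta \Phi^\gamma(x) = \int_{\mathcal{X}^n} \Phi^\gamma(x) \mathcal{F}(x \mid \theta) \, d\lambda = \int_{\mathcal{X}^n} \Phi^\gamma(x^{\pi^{-1}}) \mathcal{F}(x^{\pi^{-1}} \mid \theta) \, d\lambda$$

where the last equality follows from (11.3). Since $(\pi^{-1}, \gamma^{-1})$ is also an automorphism, $\Phi(x^{\pi^{-1}}) = \Phi^{\gamma^{-1}}(x)$,
thus $\Phi^\gamma(x^{\pi^{-1}}) = \Phi(x)$. Further, by part (3), $\mathcal{F}(x^{\pi^{-1}} \mid \theta) = \mathcal{F}(x \mid \theta^\gamma)$. Thus $m^\gamma(\theta) = \int_{\mathcal{X}^n} \Phi(x) \mathcal{F}(x \mid \theta^\gamma) \, d\lambda =
m(\theta^\gamma)$.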

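Part (5). Let $\mu \in \mathcal{M}$, so $\mu = \int_{\mathcal{X}^n} p(x) \Phi(x) \, d\lambda$ for some probability density p. Expanding $\mu^\gamma$ gives

$$\mu^\gamma = \int_{\mathcal{X}^n} p(x) \Phi^\gamma(x) \, d\lambda = \int_{\mathcal{X}^n} p(x) \Phi(x^\pi) \, d\lambda = \int_{\mathcal{X}^n} p(x^{\pi^{-1}}) \Phi(x) \, d\lambda$$

Let $p'(x) = p(x^{\pi^{-1}})$ and observe that $\int p'(x) \, d\lambda = \int p(x) \, d\lambda = 1$, so p' is also a probability density. Thus
$\mu^\gamma \in \mathcal{M}$, hence $\mathcal{M}^\gamma \subset \mathcal{M}$. Applying similar reasoning to the automorphism $(\pi^{-1}, \gamma^{-1})$, we have $\mu^{\gamma^{-1}} \in \mathcal{M}$.
Thus, every $\mu \in \mathcal{M}$ can be expressed as $\mu'^\gamma$ for some $\mu' \in \mathcal{M}$, but this means $\mathcal{M} \subset \mathcal{M}^\gamma$. Thus, $\mathcal{M} = \mathcal{M}^\gamma$.

For $\mu \in \mathrm{ri}\,\mathcal{M}$, there exists $\theta \in \Theta$ such that $\mu = m(\theta)$. The negative entropy function becomes $A^*(\mu) =
\langle \mu, \theta \rangle - A(\theta)$. From part (4), $\mu^\gamma = m(\theta^\gamma)$, thus $A^*(\mu^\gamma) = \langle \mu^\gamma, \theta^\gamma \rangle - A(\theta^\gamma) = \langle \mu, \theta \rangle - A(\theta) = A^*(\mu)$.

For $\mu$ on the border of $\mathcal{M}$, i.e., $\mu \in \mathcal{M} \setminus \mathrm{ri}\,\mathcal{M}$, $A^*(\mu^\gamma) = A^*(\mu)$ holds by a continuity argument. □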

Proof of theorem 4.2

Proof. The proof makes use of the orbit-stabilizer theorem, an elementary group-theoretic result which we
describe below.

Let G be a finite group acting on X, and let $i \in X$. Let $\mathrm{orb}(i) = \{k \in X \mid \exists g \in G \text{ s.t. } g(i) = k\}$ be the orbit
containing i and let $\mathrm{Stab}(i) = \{g \in G \mid g(i) = i\}$ be the stabilizer of i. The orbit-stabilizer theorem essentially
states that the group G can be partitioned into $|\mathrm{orb}(i)|$ subsets $G = \bigcup_{k \in \mathrm{orb}(i)} G_k$ where $G_k = \{g \in G \mid g(i) = k\}$
for each $k \in \mathrm{orb}(i)$, and $|G_k| = |\mathrm{Stab}(i)|$. Thus $|G| = |\mathrm{orb}(i)| \, |\mathrm{Stab}(i)|$.

As a consequence, we can simplify a summation over group elements to an orbit sum:

$$\frac{1}{|G|} \sum_{g \in G} f(g(i)) = \frac{1}{|G|} \sum_{k \in \mathrm{orb}(i)} \sum_{g \in G_k} f(g(i)) = \frac{1}{|\mathrm{orb}(i)|} \sum_{k \in \mathrm{orb}(i)} f(k) \qquad (11.4)$$
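
The orbit-stabilizer identity and the orbit sum above can be checked numerically on a small permutation group. The following is a minimal sketch with hypothetical helper names (`closure`, `check_orbit_stabilizer`); permutations are tuples g with g[k] the image of k, and exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

def closure(generators):
    """Generate the permutation group (as tuples) spanned by `generators`."""
    n = len(generators[0])
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = tuple(s[g[k]] for k in range(n))  # composition: apply g, then s
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def check_orbit_stabilizer(group, i, f):
    """Return (|G| == |orb(i)| * |Stab(i)|, group average == orbit average)."""
    orbit = {g[i] for g in group}
    stab = [g for g in group if g[i] == i]
    group_avg = sum(Fraction(f(g[i])) for g in group) / len(group)
    orbit_avg = sum(Fraction(f(k)) for k in orbit) / len(orbit)
    return len(group) == len(orbit) * len(stab), group_avg == orbit_avg
```

For the group generated by the transpositions (0 1) and (2 3) on four points, the orbit of 0 is {0, 1}, its stabilizer has two elements, and both checks hold for any integer-valued f.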

We now return to the main proof of the theorem. Note that $\inf_{x \in S} J(x) = c$ is equivalent to $\forall x \in S$, $J(x) \ge c$
and there exists a sequence $\{x_{(n)}\} \subset S$ such that $J(x_{(n)}) \to c$ (c can be $-\infty$). Clearly, $J(x) \ge c \; \forall x \in S_\varphi$, so
all we need to establish is a sequence $\{\bar{x}_{(n)}\} \subset S_\varphi$ such that $J(\bar{x}_{(n)}) \to c$.

Let $x \in S \subset \mathbb{R}^m$. Since G stabilizes S, $x^g \in S$ for all $g \in G$. Define $\bar{x} = \frac{1}{|G|} \sum_{g \in G} x^g$ as the symmetrization
of x. Since S is convex, $\bar{x} \in S$. Since J is convex and G stabilizes J, $J(\bar{x}) \le \frac{1}{|G|} \sum_{g \in G} J(x^g) = J(x)$.

Consider one element $\bar{x}_i$ of the vector $\bar{x}$. Using (11.4), we can express $\bar{x}_i$ as the average of $x_k$ for all k
in i's orbit:

$$\bar{x}_i = \frac{1}{|G|} \sum_{g \in G} x_{g(i)} = \frac{1}{|\mathrm{orb}(i)|} \sum_{k \in \mathrm{orb}(i)} x_k \qquad (11.5)$$

so if i and j are in the same orbit, $\bar{x}_i = \bar{x}_j$. Thus, $\bar{x} \in S_\varphi$.

With the above construction, we obtain a sequence $\{\bar{x}_{(n)}\} \subset S_\varphi$ such that $c \le J(\bar{x}_{(n)}) \le J(x_{(n)})$. Since
$J(x_{(n)}) \to c$, we also have $J(\bar{x}_{(n)}) \to c$. Thus, $\inf_{x \in S_\varphi} J(x) = c$. □
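
The symmetrization step of this proof can be illustrated with a small numeric sketch. It assumes the convention that $x^g$ has k-th entry $x_{g(k)}$, and it uses the sum of squares as an example of a convex objective that every permutation group stabilizes; the function names are hypothetical.

```python
from fractions import Fraction

def symmetrize(x, group):
    """Average the permuted copies of x over the group: entry k becomes
    the mean of x[g[k]] over all g, which is constant on each orbit."""
    n = len(x)
    return [sum(Fraction(x[g[k]]) for g in group) / len(group) for k in range(n)]

def sum_of_squares(x):
    """Example convex objective invariant under any permutation of entries."""
    return sum(v * v for v in x)
```

With the group {identity, (0 1)} acting on three coordinates, `symmetrize([1, 3, 5], group)` averages the first two entries and leaves the third unchanged, and the objective value does not increase, as the convexity argument predicts.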

Proof of corollary 4.3

Proof. Observe that the group $A_\Delta[\mathcal{F}]$ stabilizes the set $\mathcal{M}$, the function $A^*(\mu)$ (theorem 3.3 part (6)) and the
linear function $\langle \theta, \mu \rangle$ when the coefficient $\theta \in \Theta_\Delta$. Thus, this result is a direct consequence of theorem 4.2. □
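Proof of theorem 4.4

Proof. Recall that if $g \in A_\Delta[\mathcal{F}]$ then $g = (\pi, \gamma)$. The group $A_\Delta[\mathcal{F}]$ acts on $\mathcal{I}$ by the permuting action of $\gamma$ and
on V by the permuting action of $\pi$. We thus write $x^g$ to denote $x^\pi$, and $\Phi^g(x)$ to denote $\Phi^\gamma(x)$.

Consider the symmetrization of $\Phi(x)$, defined as $\bar{\Phi}(x) = \frac{1}{|A_\Delta[\mathcal{F}]|} \sum_{g \in A_\Delta[\mathcal{F}]} \Phi^g(x)$. Using an argument
similar to (11.5), $\bar{\Phi}(x) \in \mathbb{R}^m_\varphi$. Clearly, $\bar{\Phi}(x) \in \mathcal{M}$, so $\bar{\Phi}(x) \in \mathcal{M}_\varphi$. On the other hand, since $g \in A[\mathcal{F}]$,
$\Phi^g(x) = \Phi(x^g)$, so

$$\bar{\Phi}(x) = \frac{1}{|A_\Delta[\mathcal{F}]|} \sum_{g \in A_\Delta[\mathcal{F}]} \Phi(x^g) = \frac{1}{|C(x)|} \sum_{y \in C(x)} \Phi(y)$$

where we have used (11.4) and $C(x) = \mathrm{orb}_{A_\Delta[\mathcal{F}]}(x)$ is the orbit containing x. Since the right-hand side depends only
on the orbit $C(x)$, we also write it as $\bar{\Phi}(C(x))$.

We now return to the main proof. From the above, we have $\bar{\Phi}(C) \in \mathcal{M}_\varphi$, so clearly $\mathrm{conv}\{\bar{\Phi}(C) \mid C \in \mathcal{O}\} \subset
\mathcal{M}_\varphi$. Now, let $\mu \in \mathcal{M}_\varphi$; then $\mu = \sum_{x \in \mathcal{X}^n} p(x) \Phi(x)$ for some probability distribution p. Furthermore, $\mu^g = \mu$
for all $g \in A_\Delta[\mathcal{F}]$. Thus

$$\mu = \frac{1}{|A_\Delta[\mathcal{F}]|} \sum_{g} \mu^g = \frac{1}{|A_\Delta[\mathcal{F}]|} \sum_{g} \sum_{x} p(x) \Phi^g(x) = \sum_{x} p(x) \bar{\Phi}(C(x)) = \sum_{C \in \mathcal{O}} p(C) \bar{\Phi}(C)$$

where $p(C) = \sum_{y \in C} p(y)$. Therefore, $\mu \in \mathrm{conv}\{\bar{\Phi}(C) \mid C \in \mathcal{O}\}$, so $\mathcal{M}_\varphi \subset \mathrm{conv}\{\bar{\Phi}(C) \mid C \in \mathcal{O}\}$. □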
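Proof of corollary 4.5

Proof. Let $\mathcal{G} = \mathcal{G}[\mathcal{F}]$. If $\pi$ is an automorphism of $\mathcal{G}$ then $\pi$ induces a permutation on $\mathcal{I}^o$ which we denote
by $\pi^o$. We proceed in two steps. Step (1): if $\pi \in A[\mathcal{G}]$ then $(\pi, \pi^o) \in A[\mathcal{F}^o]$ where $\mathcal{F}^o$ is the overcomplete
family induced from $\mathcal{F}$; this guarantees that $A_\Delta[\mathcal{F}]$, via the action $\pi^o$, stabilizes $\mathcal{M}^o$ and $A^{o*}$. Step (2): if
$(\pi, \gamma) \in A_\Delta[\mathcal{F}]$ and $\theta \in \Theta_\Delta$ then $(\theta^o)^{\pi^o} = \theta^o$; this guarantees that $A_\Delta[\mathcal{F}]$ stabilizes the linear function
$\langle \theta^o, \mu^o \rangle$, again via the action $\pi^o$. These two steps together with theorem 4.2 will complete the proof.

Step (1). Recall that $\pi^o(u : t) = \pi(u) : t$ and $\pi^o(\{u : t, v : t'\}) = \{\pi(u) : t, \pi(v) : t'\}$. Note that $\pi^o$ is
well-defined only if $\pi$ is an automorphism of $\mathcal{G}$. We will show that $\Phi^o(x^\pi) = (\Phi^o(x))^{\pi^o}$. Indeed

$$\phi^o_{u:t}(x^\pi) = \mathbb{I}\{x_{\pi(u)} = t\} = \phi^o_{\pi(u):t}(x)$$

$$\phi^o_{\{u:t, v:t'\}}(x^\pi) = \mathbb{I}\{x_{\pi(u)} = t, x_{\pi(v)} = t'\} = \phi^o_{\{\pi(u):t, \pi(v):t'\}}(x)$$

Step (2). Note that if $(\pi, \gamma) \in A[\mathcal{F}]$ then $\gamma$ is a bijection between $\{i \mid \mathrm{scope}(\phi_i) = S\}$ and $\{j \mid \mathrm{scope}(\phi_j) = \pi(S)\}$.
Furthermore, if $(\pi, \gamma) \in A_\Delta[\mathcal{F}]$ then $\theta_{\gamma(i)} = \theta_i$ for all $i \in \mathcal{I}$.

For $u \in V$,

$$\theta^o_{u:t} = \sum_{i : \mathrm{scope}(\phi_i) = \{u\}} f_i(t) \theta_i = \sum_{i : \mathrm{scope}(\phi_i) = \{u\}} f_{\gamma(i)}(t) \theta_{\gamma(i)} = \sum_{j : \mathrm{scope}(\phi_j) = \{\pi(u)\}} f_j(t) \theta_j = \theta^o_{\pi(u):t}$$

where $f_i(t) = f_{\gamma(i)}(t)$ follows from proposition 3.2.

For $\{u, v\} \in E(\mathcal{G})$, without loss of generality, assume $u < v$. Take $i \in \mathcal{I}$ such that $\mathrm{scope}(\phi_i) = \{u, v\}$. By
proposition 3.2, if $\pi(u) < \pi(v)$ then $f_i(t, t') = f_{\gamma(i)}(t, t')$ and

$$\theta^o_{\{u:t, v:t'\}} = \sum_{i : \mathrm{scope}(\phi_i) = \{u, v\}} f_i(t, t') \theta_i = \sum_{i : \mathrm{scope}(\phi_i) = \{u, v\}} f_{\gamma(i)}(t, t') \theta_{\gamma(i)} = \sum_{j : \mathrm{scope}(\phi_j) = \{\pi(u), \pi(v)\}} f_j(t, t') \theta_j = \theta^o_{\{\pi(u):t, \pi(v):t'\}}$$

If $\pi(u) > \pi(v)$ then by proposition 3.2 $f_i(t, t') = f_{\gamma(i)}(t', t)$ and

$$\theta^o_{\{u:t, v:t'\}} = \sum_{i : \mathrm{scope}(\phi_i) = \{u, v\}} f_i(t, t') \theta_i = \sum_{i : \mathrm{scope}(\phi_i) = \{u, v\}} f_{\gamma(i)}(t', t) \theta_{\gamma(i)} = \sum_{j : \mathrm{scope}(\phi_j) = \{\pi(u), \pi(v)\}} f_j(t', t) \theta_j = \theta^o_{\{\pi(u):t, \pi(v):t'\}}$$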
□

Proof of theorem 5.1

Proof. From the proof of corollary 4.5, $A_\Delta[\mathcal{F}]$ stabilizes the objective function $\langle \theta^o, \mu^o \rangle$, so it remains to show
that this group also stabilizes the set OUTER.

We first elaborate on what it means in a formal sense for OUTER to depend only on the graph $\mathcal{G}$. The
intuition here is that the constraints that form OUTER are constructed purely from graph properties of $\mathcal{G}$, and
not from the way we assign labels to nodes of $\mathcal{G}$. Formally, let $I_{\mathrm{OUTER}}(\tau, \mathcal{G})$ be the indicator function of the set
OUTER: given a pair $(\tau, \mathcal{G})$, this function returns 1 if $\tau$ belongs to $\mathrm{OUTER}(\mathcal{G})$ and 0 otherwise. Relabeling $\mathcal{G}$ by
assigning the index $\pi(u)$ to the node u for some $\pi \in \mathbb{S}_n$, we obtain a graph $\mathcal{G}' = \mathcal{G}^\pi$ isomorphic to $\mathcal{G}$. Reassigning
the index of $\tau$ accordingly, we obtain $\tau' = \tau^{\pi^o}$. Since the construction of OUTER is invariant w.r.t. relabeling of $\mathcal{G}$,
we have $I_{\mathrm{OUTER}}(\tau, \mathcal{G}) = I_{\mathrm{OUTER}}(\tau^{\pi^o}, \mathcal{G}^\pi)$.

If $\pi$ is an automorphism of $\mathcal{G}$, $I_{\mathrm{OUTER}}(\tau, \mathcal{G}) = I_{\mathrm{OUTER}}(\tau^{\pi^o}, \mathcal{G})$, so $\tau \in \mathrm{OUTER}(\mathcal{G}) \Leftrightarrow \tau^{\pi^o} \in \mathrm{OUTER}(\mathcal{G})$.
Thus the group $A(\mathcal{G})$ stabilizes $\mathrm{OUTER}(\mathcal{G})$. From theorem 3.3, if $(\pi, \gamma) \in A[\mathcal{F}]$ then $\pi$ is an automorphism of
$\mathcal{G}$. Thus, $A_\Delta[\mathcal{F}]$ also stabilizes $\mathrm{OUTER}(\mathcal{G})$. □
Proof of theorem 7.1

Proof. Clearly every lifted cycle constraint in $\overline{\mathrm{Cyc}}[i]$ can be rewritten in form (7.3). We now show that every
constraint in this form is a lifted constraint in $\overline{\mathrm{Cyc}}[i]$. To do this, for every cycle $\bar{C}$ passing through {i} and
every odd-sized $F \subset \bar{C}$, we will point out a constraint in Cyc[i] whose lifted form is of the form (7.3).

We first show that if $\bar{e}$ is an edge orbit connecting two node orbits $\bar{u}$ and $\bar{v}$, then for any $u \in \bar{u}$, there exists
an edge $e = \{u, v\}$ such that $e \in \bar{e}$ and $v \in \bar{v}$. Let $\{u_0, v_0\}$ be an arbitrary member of $\bar{e}$ such that $u_0 \in \bar{u}$ and
$v_0 \in \bar{v}$. Since u and $u_0$ are in the same node orbit, there exists a group element g such that $g(u_0) = u$. Take
$v = g(v_0)$; then clearly $e = \{u, v\}$ satisfies $e \in \bar{e}$ and $v \in \bar{v}$.

Using the above, it is straightforward to prove a stronger statement by induction. If $\bar{p} = \bar{e}_1, \ldots, \bar{e}_n$ is a
path in $\bar{\mathcal{G}}[i]$ from node orbit $\bar{u}$ to $\bar{v}$, and $u \in \bar{u}$, then there exists a path $p = e_1, \ldots, e_n$ in $\mathcal{G}$ from node u to v
such that $e_j \in \bar{e}_j$ for all j, and $v \in \bar{v}$.

A cycle in $\bar{\mathcal{G}}[i]$ passing through {i} is a path $\bar{C} = \bar{e}_1, \ldots, \bar{e}_n$ from {i} to {i} itself. Thus, there must exist
a path $C = e_1, \ldots, e_n$ in $\mathcal{G}$ from i to i (so that C is a cycle in $\mathcal{G}$ passing through i), with $e_j \in \bar{e}_j$. Thus, taking
an arbitrary constraint of the form (7.3), there exists a corresponding ground constraint on the cycle C passing
through i in $\mathcal{G}$, and this constraint clearly belongs to Cyc[i]. □
Proof of theorem 8.2

Proof. Since $\mathfrak{G}_\Delta$ is a bipartite graph and variable and factor nodes have different colors, an automorphism of
$\mathfrak{G}_\Delta$ must have the form of a pair of permutations $(\pi, \gamma)$ where $\pi \in \mathbb{S}_n$ is a permutation among variable nodes and
$\gamma \in \mathbb{S}_m$ is a permutation among factor nodes.

Let $j = \gamma(i)$. Since i and j have the same color, $j \sim_\Delta i$. This shows that $\gamma$ is consistent with the partition $\Delta$.

We now show that $(\pi, \gamma)$ is an automorphism of the exponential family $\mathcal{F}$. To do this, we make use of
proposition 3.2. From the coloring of $\mathfrak{G}_\Delta$ we have $f_i = f_j$. Since $\pi$ maps neighbors of i to neighbors of j, $\pi$
must be a bijection from $\mathrm{scope}(f_i)$ to $\mathrm{scope}(f_j)$. Let $\alpha = \eta_j^{-1} \circ \pi \circ \eta_i$; we need to show that $f_i(t^\alpha) = f_j(t)$ for
all t. There are two cases.

(i) If $f_j$ is a symmetric function, so is $f_i$, and thus $f_i(t^\alpha) = f_i(t) = f_j(t)$.

(ii) If $f_j$ is not a symmetric function, since $\pi$ must preserve the coloring of edges adjacent to i and j, it must
map $f_i$'s k-th argument to $f_j$'s k-th argument: $\pi(\eta_i(k)) = \eta_j(k)$. Therefore $\alpha(k) = \eta_j^{-1}(\eta_j(k)) = k$, so $\alpha$ is
the identity permutation. Thus, $f_i(t^\alpha) = f_i(t) = f_j(t)$. □

Proof of theorem 8.3

Proof. Let r be a renaming permutation, and let $\omega$ be a Herbrand model. Let $r(\omega)$ denote the Herbrand
model obtained by applying r to all groundings in $\omega$. Using lemma 1 from [1], we have $\omega \models F_k(s)$ iff
$r(\omega) \models F_k(r(s))$. Writing $\omega$ as a vector of 0 or 1, where 1 indicates that the corresponding grounding is true,
$r(\omega)$ in vector form is the same as $\omega^{\pi_r^{-1}}$, i.e., the vector $\omega$ permuted by $\pi_r^{-1}$. Thus, $\mathbb{I}\{\omega \models F_k(s)\} =
\mathbb{I}\{\omega^{\pi_r^{-1}} \models \gamma_r(F_k(s))\}$, or equivalently, if $\Phi$ is the feature function of the MLN in vector form, then $\Phi(\omega) =
\Phi^{\gamma_r}(\omega^{\pi_r^{-1}})$. Thus $(\pi_r, \gamma_r)$ is an automorphism of the MLN. □